Recurrent Models of Visual Attention

Authors

  • Volodymyr Mnih
  • Nicolas Heess
  • Alex Graves
  • Koray Kavukcuoglu

Abstract

Applying convolutional neural networks to large images is computationally expensive because the amount of computation scales linearly with the number of image pixels. We present a novel recurrent neural network model that is capable of extracting information from an image or video by adaptively selecting a sequence of regions or locations and only processing the selected regions at high resolution. Like convolutional neural networks, the proposed model has a degree of translation invariance built-in, but the amount of computation it performs can be controlled independently of the input image size. While the model is non-differentiable, it can be trained using reinforcement learning methods to learn task-specific policies. We evaluate our model on several image classification tasks, where it significantly outperforms a convolutional neural network baseline on cluttered images, and on a dynamic visual control problem, where it learns to track a simple object without an explicit training signal for doing so.
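
The abstract's claim that computation can be controlled independently of the input image size can be illustrated with a minimal sketch. It assumes a single fixed-size square patch per glimpse and hand-picked locations; the abstract does not specify the sensor, and in the model itself the locations come from a learned policy. The names `extract_glimpse` and `make_image` are hypothetical and not taken from the paper.

```python
def extract_glimpse(image, center, size):
    """Return a size x size patch around center=(row, col).

    Assumption: pixels that fall outside the image are filled with 0.
    The work done is size * size pixel reads, whatever the image size.
    """
    height, width = len(image), len(image[0])
    top = center[0] - size // 2
    left = center[1] - size // 2
    patch = []
    for r in range(top, top + size):
        row = []
        for c in range(left, left + size):
            inside = 0 <= r < height and 0 <= c < width
            row.append(image[r][c] if inside else 0)
        patch.append(row)
    return patch


def make_image(height, width):
    """Build a toy grayscale image with distinct non-zero pixel values."""
    return [[r * width + c + 1 for c in range(width)] for r in range(height)]


# Hand-picked locations stand in for the policy's choices (an assumption).
locations = [(0, 0), (16, 16), (31, 31)]

small = make_image(32, 32)
large = make_image(512, 512)

small_glimpses = [extract_glimpse(small, loc, 8) for loc in locations]
large_glimpses = [extract_glimpse(large, loc, 8) for loc in locations]

# Pixels read per image: number of glimpses times patch area.
pixels_read = len(locations) * 8 * 8
```

In this sketch `pixels_read` is the same for the 32 x 32 and the 512 x 512 image: the cost depends on the number of glimpses and the patch size, not on the number of image pixels.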

Similar resources

Just Noticeable Difference Estimation Using Visual Saliency in Images

Due to some physiological and physical limitations in the brain and the eye, the human visual system (HVS) is unable to perceive changes in the visual signal whose magnitude is below a certain threshold, the so-called just-noticeable distortion (JND) threshold. Visual attention (VA) provides a mechanism for selection of particular aspects of a visual scene so as to reduce the computational loa...

Dual Recurrent Attention Units for Visual Question Answering

We propose an architecture for VQA which utilizes recurrent layers to generate visual and textual attention. The memory characteristic of the proposed recurrent attention units offers a rich joint embedding of visual and textual features and enables the model to reason about relations between several parts of the image and question. Our single model outperforms the first place winner on the VQA 1.0 d...

Modeling Utterance-driven Visual Attention during Situated Comprehension

Evidence from behavioral studies demonstrates that spoken language guides attention in a related visual scene and that attended scene information can influence the comprehension process. Here we model sentence comprehension within visual contexts. A recurrent neural network is trained to associate the linguistic input with the visual scene and to produce the interpretation of the described even...

Visual Attention Models of Object Counting

We develop a sequential learning model using a recurrent neural network architecture and reinforcement learning to recognize and count objects in images. Simple feedforward neural networks perform well on this task when trained using backpropagation; however, convolutional neural networks are computationally expensive and results are less certain when the image input has imperfect resolution ou...

Tuning curve shift by attention modulation in cortical neurons: a computational study of its mechanisms.

Physiological studies of visual attention have demonstrated that focusing attention near a visual cortical neuron's receptive field (RF) results in enhanced evoked activity and RF shift. In this work, we explored the mechanisms of attention induced RF shifts in cortical network models that receive an attentional 'spotlight'. Our main results are threefold. First, whereas a 'spotlight' input alw...

A Hierarchical Generative Model of Recurrent Object-Based Attention in the Visual Cortex

In line with recent work exploring Deep Boltzmann Machines (DBMs) as models of cortical processing, we demonstrate the potential of DBMs as models of object-based attention, combining generative principles with attentional ones. We show: (1) How inference in DBMs can be related qualitatively to theories of attentional recurrent processing in the visual cortex; (2) that deepness and topographic ...

Publication date: 2014